GRAB - Inverted Indexes with Low Storage Overhead

Author

  • Michael E. Lesk
Abstract

A searching command (grab) for maintaining indexes combines acceptably fast searching with very low storage overhead. It looks like grep except that it demands a preindexing pass, looks only for whole words, and runs faster. As an example of performance, consider the time to search for single words in a 7.8 Mbyte file (the Brown corpus of English). The times below are in seconds on a DEC 8600 running Ultrix; the space overhead is given as a percentage of the original file.

[Table truncated in the source. Columns: word, number of uses, and three headings that are illegible here. Rows begin: Shakespeare, Dickens, Chaucer. The only surviving value is 29.]
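The abstract does not describe how grab lays out its index, so the following Python sketch only illustrates the general idea it names: a preindexing pass, whole-word lookup, and a small index. Everything in it is an assumption made for illustration, including the names `build_index` and `grab`, the `lines_per_block` parameter, and the choice to record block numbers rather than exact positions as one possible way to keep the index small.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters or digits, compared case-insensitively.
WORD = re.compile(r"[A-Za-z0-9]+")


def words(line):
    """Return the set of lowercased whole words in one line."""
    return {w.lower() for w in WORD.findall(line)}


def build_index(lines, lines_per_block=64):
    """Preindexing pass: map each word to the set of block numbers containing it.

    Recording only the block number (not the exact line or offset) keeps
    the index small, at the cost of rescanning a block at search time.
    """
    index = defaultdict(set)
    for n, line in enumerate(lines):
        for w in words(line):
            index[w].add(n // lines_per_block)
    return index


def grab(word, lines, index, lines_per_block=64):
    """Return (line_number, line) pairs whose line contains `word` as a whole word.

    Only blocks listed in the index for `word` are scanned.
    """
    target = word.lower()
    hits = []
    for block in sorted(index.get(target, ())):
        first = block * lines_per_block
        last = min(first + lines_per_block, len(lines))
        for n in range(first, last):
            if target in words(lines[n]):
                hits.append((n + 1, lines[n]))
    return hits


if __name__ == "__main__":
    corpus = [
        "Shakespeare wrote plays",
        "Dickens wrote novels",
        "Chaucer wrote tales",
        "shakespearean drama",
    ]
    idx = build_index(corpus, lines_per_block=2)
    for number, text in grab("Shakespeare", corpus, idx, lines_per_block=2):
        print(number, text)
```

In this sketch, larger blocks shrink the index but make each lookup scan more text, which is one plausible form of the speed versus storage trade-off the abstract alludes to.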


Similar resources

Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions

Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low cardinality data domain results in replication of data descriptors, leading to increased storage overhead. For example, the use of RFID or similar sensing devices in supply-chains results in massive tracking datasets that...

Efficient Phrase Querying with an Auxiliary Index

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...

Taming Hot-Spots in DHT Inverted Indexes

DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow nonuniform distributions, often power law distributions, creating intense local storag...

Efficient Query Processing on Term-Based-Partitioned Inverted Indexes

In a shared-nothing, parallel text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the inverted index is either document-based or term-based partitioned, depending on properties of the underlying hardware infrastructure, query traffic, and some performance and availability constraints. In query processing on term-b...

Journal:
  • Computing Systems

Volume 1

Publication date: 1988